An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each dataset is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationships between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer-learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting previously learned classes.
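As a hedged illustration of the CLIP-driven design, the sketch below conditions a per-class segmentation head on a frozen CLIP text embedding via a small controller network; the prompt wording, feature dimensions, and head structure are our assumptions, not the released Universal Model code.

```python
# Minimal sketch (not the authors' code): a CLIP text embedding for each class
# prompt generates the parameters of a dynamic 1x1x1 segmentation head.
import torch
import torch.nn as nn
import clip  # pip install git+https://github.com/openai/CLIP.git

class CLIPDrivenHead(nn.Module):
    def __init__(self, feat_dim=48, clip_dim=512):
        super().__init__()
        # Map one CLIP text embedding to per-class convolution weights + bias.
        self.controller = nn.Linear(clip_dim, feat_dim + 1)

    def forward(self, feats, text_emb):
        # feats: (B, C, D, H, W) decoder features; text_emb: (clip_dim,)
        params = self.controller(text_emb)
        w, b = params[:-1], params[-1]
        # Dynamic 1x1x1 convolution producing one binary mask per class.
        logits = torch.einsum("bcdhw,c->bdhw", feats, w) + b
        return logits.unsqueeze(1)  # (B, 1, D, H, W)

model, _ = clip.load("ViT-B/32", device="cpu")
with torch.no_grad():
    tokens = clip.tokenize(["a computerized tomography of a liver"])  # illustrative prompt
    emb = model.encode_text(tokens)[0].float()  # (512,)
head = CLIPDrivenHead()
mask_logits = head(torch.randn(1, 48, 8, 32, 32), emb)
```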
Deep learning-based 3D object detectors have made significant progress in recent years and have been deployed in a wide range of applications. It is crucial to understand the robustness of detectors against adversarial attacks when employing them in security-critical applications. In this paper, we make the first attempt to conduct a thorough evaluation and analysis of the robustness of 3D detectors under adversarial attacks. Specifically, we first extend three kinds of adversarial attacks to the 3D object detection task to benchmark the robustness of state-of-the-art 3D object detectors on the KITTI and Waymo datasets, followed by an analysis of the relationship between robustness and detector properties. Then, we explore the transferability of cross-model, cross-task, and cross-data attacks. We finally conduct comprehensive experiments on defenses for 3D detectors, demonstrating that simple transformations like flipping provide little help in improving robustness when the transformation strategy applied to the input point cloud is exposed to attackers. Our findings will facilitate investigations into understanding and defending against adversarial attacks on 3D object detectors to advance this field.
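As one concrete instance of the kind of attack being benchmarked, a minimal PGD-style sketch on point coordinates might look as follows; the `detector(points) -> loss` interface and the budget values are illustrative assumptions, and the paper's exact attack set is not reproduced here.

```python
# Hedged sketch: projected gradient descent on point-cloud coordinates.
import torch

def pgd_attack_points(detector, points, eps=0.05, alpha=0.01, steps=10):
    """Perturb point coordinates within an L-inf ball of radius eps (meters)."""
    adv = points.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = detector(adv)  # assumed: returns a scalar detection loss
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()                  # ascend the loss
            adv = points + (adv - points).clamp(-eps, eps)   # project to budget
    return adv.detach()
```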
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one. We present a diffusion-based captioning model, dubbed DDCap, to allow more decoding flexibility. Unlike image generation, where the output is continuous and redundant with a fixed length, texts in image captions are categorical and short, with varied lengths. Therefore, naively applying the discrete diffusion model to text decoding does not work well, as shown in our experiments. To address the performance gap, we propose several key techniques, including best-first inference, a concentrated attention mask, text length prediction, and image-free training. On COCO without additional caption pre-training, DDCap achieves a CIDEr score of 117.8, which is +5.0 higher than the auto-regressive baseline with the same architecture in a controlled setting. It also achieves a CIDEr score 26.8 points higher than the auto-regressive baseline (230.3 vs. 203.5) on a caption infilling task. With 4M vision-language pre-training images and a base-sized model, we reach a CIDEr score of 125.1 on COCO, which is competitive with the best well-developed auto-regressive frameworks. The code is available at https://github.com/buxiangzhiren/DDCap.
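To make the best-first idea concrete, the sketch below shows one plausible decoding loop: all caption positions start masked, and at each step only the most confident predictions are revealed, so decoding proceeds from easy tokens to hard ones. The model interface and reveal schedule are our assumptions, not the released DDCap code.

```python
# Hedged sketch of best-first discrete-diffusion caption decoding.
import torch

@torch.no_grad()
def best_first_decode(model, image_feats, length, mask_id, steps=10):
    tokens = torch.full((1, length), mask_id, dtype=torch.long)  # fully masked
    for step in range(steps):
        logits = model(tokens, image_feats)            # assumed: (1, length, vocab)
        probs, preds = logits.softmax(-1).max(-1)      # confidence and argmax
        still_masked = tokens.eq(mask_id)
        # Reveal roughly length/steps positions per step, most confident first.
        k = int(length * (step + 1) / steps) - int(length * step / steps)
        k = min(max(k, 1), int(still_masked.sum()))
        conf = probs.masked_fill(~still_masked, -1.0)  # ignore revealed positions
        idx = conf.topk(k, dim=-1).indices
        tokens.scatter_(1, idx, preds.gather(1, idx))
    return tokens
```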
Convolutional neural networks (CNNs) have obtained excellent performance through deep architectures. However, these CNNs often achieve poor robustness for image super-resolution (SR) under complex scenes. In this paper, we propose a heterogeneous group SR CNN (HGSRCNN) that obtains high-quality images by exploiting different types of structural information. Specifically, each heterogeneous group block (HGB) of HGSRCNN uses a heterogeneous architecture containing a symmetric group convolutional block and a complementary convolutional block in a parallel way to enhance the internal and external relations of different channels, facilitating richer types of structural information. To prevent the appearance of redundant features, a refinement block with signal enhancement, arranged in a serial way, is designed to filter out useless information. To prevent the loss of original information, a multi-level enhancement mechanism guides the CNN toward a symmetric architecture to promote the expressive ability of HGSRCNN. Besides, a parallel up-sampling mechanism is developed to train a blind SR model. Extensive experiments show that the proposed HGSRCNN achieves excellent SR performance in terms of both quantitative and qualitative analysis. The code can be accessed at https://github.com/hellloxiaotian/hgsrcnn.
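A minimal sketch of one such heterogeneous block, under assumptions, is shown below: a group-convolution branch and a plain-convolution branch run in parallel and their outputs are fused. Layer counts and channel sizes are illustrative, not the released HGSRCNN configuration.

```python
# Hedged sketch of a heterogeneous group block (HGB).
import torch
import torch.nn as nn

class HGB(nn.Module):
    def __init__(self, channels=64, groups=4):
        super().__init__()
        # Group-convolution branch: per-group (internal) channel relations.
        self.group_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
        )
        # Complementary plain-convolution branch: cross-channel (external) relations.
        self.comp_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.fuse = nn.Conv2d(channels, channels, 1)  # refine fused features

    def forward(self, x):
        out = self.group_branch(x) + self.comp_branch(x)  # parallel fusion
        return self.fuse(out) + x                         # residual connection

print(HGB()(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```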
Contextual information in search sessions is important for capturing users' search intents. Various approaches have been proposed to model user behavior sequences to improve document ranking within a session. Typically, training samples of (search context, document) pairs are sampled randomly in each training epoch. In reality, the difficulty of understanding a user's search intent and judging a document's relevance varies greatly from one search context to another. Mixing training samples of different difficulties may confuse the model's optimization process. In this work, we propose a curriculum learning framework for context-aware document ranking, in which the ranking model learns the matching signals between the search context and the candidate document in an easy-to-hard manner. In this way, we aim to guide the model gradually toward a global optimum. To leverage both positive and negative examples, two curricula are designed. Experiments on two real-world query log datasets show that our proposed framework can significantly improve the performance of several existing methods, demonstrating the effectiveness of curriculum learning for context-aware document ranking.
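A minimal sketch of easy-to-hard scheduling, under assumptions, is given below: samples are sorted by a difficulty score and the training pool is unlocked gradually. The difficulty function is an illustrative stand-in (e.g., the loss of a pretrained ranker), and the paper designs separate curricula for positive and negative examples.

```python
# Hedged sketch of curriculum scheduling over (search context, document) pairs.
import random

def curriculum_batches(samples, difficulty, num_epochs, batch_size):
    """Yield (epoch, batch) pairs, admitting harder samples as training proceeds."""
    ordered = sorted(samples, key=difficulty)   # easiest first
    for epoch in range(1, num_epochs + 1):
        competence = epoch / num_epochs          # fraction of the pool unlocked
        pool = ordered[: max(batch_size, int(len(ordered) * competence))]
        random.shuffle(pool)                     # shuffle within the unlocked pool
        for i in range(0, len(pool), batch_size):
            yield epoch, pool[i : i + batch_size]
```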
Removing bias while preserving all task-relevant information is challenging for fair representation learning methods, since they would yield random or degenerate representations with respect to labels when the sensitive attribute correlates with the label. Existing works propose injecting label information into the learning procedure to overcome such issues. However, the assumption that the observed labels are clean is not always satisfied. In fact, label bias is recognized as a primary source of discrimination; in other words, fair pre-processing methods ignore the discrimination encoded in the labels, either during the learning procedure or in the evaluation stage. This contradiction puts a question mark on the fairness of the learned representations. To circumvent this issue, we explore the following question: can we learn fair representations predictive of the latent ideal fair labels given access only to unreliable labels? In this work, we propose a De-Biased Representation learning for Fairness (DBRF) framework that disentangles sensitive information from non-sensitive attributes while keeping the learned representations predictive of the ideal fair labels rather than the observed biased ones. We formulate the de-biased learning framework through information-theoretic concepts such as mutual information and the information bottleneck. The core idea is that DBRF advocates not using unreliable labels for supervision when sensitive information benefits their prediction. Experimental results on both synthetic and real-world data demonstrate that DBRF effectively learns de-biased representations towards ideal labels.
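To make the information-theoretic formulation concrete, one common way such an objective is written, purely as a hedged sketch with illustrative notation (input x, sensitive attribute s, noisy label y, representation z) and not the paper's exact formulation, is:

```latex
% Hedged sketch of an information-bottleneck-style de-biasing objective;
% notation illustrative, not DBRF's exact formulation.
\max_{p(z \mid x)} \; I(z; y) \;-\; \beta \, I(z; s) \;-\; \gamma \, I(z; x \mid y, s)
```

Here the first term keeps z predictive of the label, the second purges sensitive information, and the third compresses nuisance content; DBRF's distinctive step is to additionally withhold supervision from y wherever sensitive information drives its prediction.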
The inherent ambiguity in ground-truth annotations of 3D bounding boxes, caused by occlusion, signal loss, or manual annotation errors, can confuse deep 3D object detectors during training and thus deteriorate detection accuracy. However, existing methods largely overlook such issues and treat the labels as deterministic. In this paper, we propose GLENet, a generative framework for label uncertainty estimation adapted from the conditional variational auto-encoder, to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes with latent variables. The label uncertainty generated by GLENet is a plug-and-play module that can be conveniently integrated into existing deep 3D detectors to build probabilistic detectors and supervise the learning of localization uncertainty. Besides, we propose an uncertainty-aware quality estimator architecture for probabilistic detectors to guide the training of the IoU branch with the predicted localization uncertainty. We incorporate the proposed methods into various popular 3D detectors and observe that their performance is significantly boosted to the current state of the art on the Waymo Open Dataset and the KITTI dataset.
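A minimal sketch, under assumptions, of how a conditional VAE can turn the one-to-many object/box relationship into a label-uncertainty estimate: decode several plausible boxes for one object from latent samples and take their per-parameter variance as the uncertainty target. Network shapes and the 7-parameter box encoding are illustrative, not the released GLENet code.

```python
# Hedged sketch: CVAE decoder sampling as a label-uncertainty estimator.
import torch
import torch.nn as nn

class BoxCVAEDecoder(nn.Module):
    def __init__(self, cond_dim=256, latent_dim=32, box_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, box_dim),  # (x, y, z, w, l, h, yaw)
        )

    def forward(self, cond, z):
        return self.net(torch.cat([cond, z], dim=-1))

@torch.no_grad()
def label_uncertainty(decoder, obj_feats, n_samples=30, latent_dim=32):
    """obj_feats: (1, cond_dim) features of one object's point cloud."""
    zs = torch.randn(n_samples, latent_dim)
    boxes = decoder(obj_feats.expand(n_samples, -1), zs)  # (n_samples, 7)
    return boxes.var(dim=0)  # per-parameter label uncertainty

decoder = BoxCVAEDecoder()
print(label_uncertainty(decoder, torch.randn(1, 256)))
```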
Recently, Optimization-Derived Learning (ODL), which designs learning models from the perspective of optimization, has attracted attention in the learning and vision fields. However, previous ODL approaches regard the training and hyper-training procedures as two separate stages, meaning that the hyper-training variables must be fixed during the training process, and it is therefore impossible to obtain the convergence of the training and hyper-training variables simultaneously. In this work, we design a Generalized Krasnoselskii-Mann (GKM) scheme based on fixed-point iterations as our fundamental ODL module, which unifies existing ODL methods as special cases. Under the GKM scheme, a Bilevel Meta Optimization (BMO) algorithmic framework is constructed to jointly solve for the optimal training and hyper-training variables. We rigorously prove the essential joint convergence of the fixed-point iterations for training and the process of optimizing hyper-parameters for hyper-training, in terms of both approximation quality and stationarity analysis. Experiments demonstrate the efficiency of BMO, with competitive performance on sparse coding and real-world applications such as image deconvolution and rain streak removal.
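For reference, the classical Krasnoselskii-Mann iteration that the GKM scheme generalizes can be written as follows; this is the textbook form only, and the paper's generalization of the operator and its coupling with hyper-training variables go beyond it:

```latex
% Classical Krasnoselskii-Mann fixed-point iteration (textbook form);
% GKM generalizes the operator T and couples it with hyper-training variables.
x_{k+1} = (1 - \alpha_k)\, x_k + \alpha_k\, T(x_k), \qquad \alpha_k \in (0, 1)
```

For a nonexpansive operator T with a fixed point, this iteration converges to some x* = T(x*) under the standard step-size condition \(\sum_k \alpha_k (1 - \alpha_k) = \infty\).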
An important goal of self-supervised learning is to enable model pre-training to benefit from almost unlimited data. However, one method that has recently become popular, namely masked image modeling (MIM), is suspected to be unable to benefit from larger data. In this work, we break this misconception through extensive experiments, with data scales ranging from 10% of ImageNet-1K to full ImageNet-22K, model sizes ranging from 49 million to 1 billion parameters, and training lengths ranging from 125K to 500K iterations. Our study reveals that: (i) masked image modeling remains demanding of larger data, and we observe that very large models overfit on relatively small data; (ii) training length matters, as large models trained with masked image modeling can benefit from more data given longer training; (iii) the validation loss in pre-training is a good indicator of how well the model performs when fine-tuned on multiple tasks. This observation allows us to pre-evaluate pre-trained models in advance without costly trial-and-error assessment on downstream tasks. We hope our findings will advance the understanding of masked image modeling in terms of its scaling ability.